
    Arguing Machines: Human Supervision of Black Box AI Systems That Make Life-Critical Decisions

    We consider the paradigm of a black box AI system that makes life-critical decisions. We propose an "arguing machines" framework that pairs the primary AI system with a secondary one that is independently trained to perform the same task. We show that disagreement between the two systems, without any knowledge of the underlying system design or operation, is sufficient to arbitrarily improve the accuracy of the overall decision pipeline given human supervision over disagreements. We demonstrate this system in two applications: (1) an illustrative example of image classification and (2) large-scale real-world semi-autonomous driving data. In the first application, the framework reduces top-5 error on ImageNet from 8.0% to 2.8%. In the second, we apply the framework to Tesla Autopilot and show that it predicts 90.4% of the system disengagements that human annotators labeled as challenging and needing human supervision.
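    The escalation logic described in this abstract can be illustrated with a minimal sketch. It assumes that each system emits a single label per input and that "disagreement" simply means the two labels differ. The paper's actual disagreement criterion (for example, in its top-5 ImageNet evaluation) may be defined differently. The function name `arguing_machines` and the `human` callback are hypothetical, and the human is modeled as an oracle.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Label = TypeVar("Label")


def arguing_machines(
    primary: Sequence[Label],
    secondary: Sequence[Label],
    human: Callable[[int], Label],
) -> Tuple[List[Label], int]:
    """Hypothetical sketch: accept agreed-upon decisions, escalate disagreements.

    Assumption: disagreement means the two systems output different labels.
    Returns the final decisions and the number of inputs sent to the human.
    """
    decisions: List[Label] = []
    escalated = 0
    for i, (p, s) in enumerate(zip(primary, secondary)):
        if p == s:
            decisions.append(p)  # systems agree: keep the primary's decision
        else:
            decisions.append(human(i))  # systems disagree: defer to the human supervisor
            escalated += 1
    return decisions, escalated


# Toy example with made-up labels; the human is modeled as an oracle.
truth = ["cat", "dog", "car", "cat", "bird"]
primary = ["cat", "cat", "car", "dog", "bird"]    # wrong on inputs 1 and 3
secondary = ["cat", "dog", "car", "cat", "bus"]   # wrong on input 4
final, n_escalated = arguing_machines(primary, secondary, human=lambda i: truth[i])
print(final, n_escalated)  # ['cat', 'dog', 'car', 'cat', 'bird'] 3
```

    In this toy run the primary system alone is wrong on 2 of 5 inputs, while the supervised pipeline is wrong on none, at the cost of 3 human reviews. Errors on which both systems agree still go uncaught. This is why the secondary system is trained independently: correlated mistakes defeat the disagreement signal.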

    Learning of Identity from Behavioral Biometrics for Active Authentication

    In this work, we look into the problem of active authentication on desktop computers and mobile devices. Active authentication is the process of continuously verifying a person's identity based on the cognitive, behavioral, and physical aspects of their interaction with the device. We consider several representative modalities, including keystroke dynamics, mouse movement, application usage patterns, web browsing behavior, GPS location, and stylometry. We implement a binary classifier for each modality and organize the classifiers as a parallel binary decision fusion architecture. The decision of each classifier is fed into a decision fusion center (DFC), which applies the Chair-Varshney fusion rule to generate a global decision. The DFC minimizes the probability of error using estimates of each local classifier's false acceptance rate (FAR) and false rejection rate (FRR). We test our approach on two large datasets of 67 desktop computer users and 200 mobile device users. We characterize the performance of the system with respect to intruder detection time and quantify the contribution of each modality to the overall performance.
    Ph.D., Computer Engineering -- Drexel University, 201
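    The fusion step described in this abstract can be sketched as follows. The sketch uses the standard Chair-Varshney rule, which sums per-classifier log-likelihood ratios weighted by each classifier's error rates. It assumes several things: the local decisions are conditionally independent given the true hypothesis, +1 means "genuine user" and -1 means "intruder", all FAR/FRR values lie strictly between 0 and 1, and ties are resolved as a rejection. The function name and the example rates are hypothetical and do not come from the thesis.

```python
import math
from typing import Sequence


def chair_varshney(
    decisions: Sequence[int],  # local decisions: +1 = accept (genuine), -1 = reject (intruder)
    far: Sequence[float],      # per-classifier FAR = P(u = +1 | intruder)
    frr: Sequence[float],      # per-classifier FRR = P(u = -1 | genuine)
    prior_genuine: float = 0.5,
) -> int:
    """Hypothetical sketch of Chair-Varshney fusion for authentication.

    Computes log P(genuine | u) / P(intruder | u) under conditional
    independence and accepts (+1) only if it is strictly positive.
    """
    llr = math.log(prior_genuine / (1.0 - prior_genuine))  # prior log-odds
    for u, fa, fr in zip(decisions, far, frr):
        if u == 1:
            # P(+1 | genuine) / P(+1 | intruder) = (1 - FRR) / FAR
            llr += math.log((1.0 - fr) / fa)
        else:
            # P(-1 | genuine) / P(-1 | intruder) = FRR / (1 - FAR)
            llr += math.log(fr / (1.0 - fa))
    return 1 if llr > 0 else -1


# Made-up rates: one reliable modality and two weak ones.
far = [0.05, 0.3, 0.3]
frr = [0.05, 0.3, 0.3]
print(chair_varshney([1, -1, -1], far, frr))  # 1: the reliable "accept" outweighs two weak "rejects"
```

    Unlike majority voting, which would reject in this example, the rule weights each vote by how trustworthy that classifier is. Here log(0.95/0.05) ≈ 2.94 outweighs 2·log(0.3/0.7) ≈ -1.69.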